Fix memory leaks and performance improvements#22
Conversation
```cpp
parallel_for(0, G->numCell(), [&](intT i) {
  if (ccFlag[i]) {
    trees[i] = new treeT(G->getCell(i)->getItem(), G->getCell(i)->size(), false);
```
Major leak: `hasEdge` duplicates some `new treeT` objects without removing them. Fix: preallocate them. Note: I had trouble using locks here.
Thank you. In `hasEdge`, the trees were allocated on demand to save memory in case they were not required. How are the trees getting duplicated without being removed? From concurrent calls to `hasEdge`?
I think what was happening was that in `hasEdge`:

```cpp
if (!trees[n1])
    trees[n1] = new treeT(cells[n1].getItem(), cells[n1].size(), false); // TODO: allocation, parallel
if (!trees[n2])
    trees[n2] = new treeT(cells[n2].getItem(), cells[n2].size(), false); // TODO: allocation, parallel
```
while one thread is doing `new treeT`, another thread sees that `trees[x]` is empty, so it allocates one too. Both threads assign their new `treeT` to the same index, and one of the objects leaks.
This fix could be improved to work as originally intended, but it currently works, and speed is improved because threads no longer duplicate work, so I moved on to fixing other areas.
```cpp
floatT hop = sqrt(dim + 3) * 1.0000001;
nbrCache[bait-cells] = tree->rangeNeighbor(bait, r * hop, fStop, fWrap, true, nbrCache[bait-cells]);
```

```cpp
// wait for other threads to do their thing then try again
std::lock_guard<std::mutex> lock(cacheLocks[idx]);
```
Minor leak: multiple threads cache the same data in `rangeNeighbor` at once, overwriting each other. Fix: use locks.
Thank you. Did you notice any performance degradation as a result of using a mutex?
I did not notice a change in performance. Threads should now be doing less unnecessary work, which should offset the cost of the mutex.
```diff
@@ -4,6 +4,27 @@
 #include "dbscan/pbbs/parallel.h"
```
No problem! I also have a fix for the Arm CPU crash, which I will upload later. There is also a very rare segmentation fault (roughly a 1-in-a-million chance) that I am looking into.
Fixes

- Major leak: `hasEdge` duplicates some `new treeT` objects without removing them. Fix: preallocate them. Note: I had trouble using locks here.
- Major leak: not using `Py_DECREF`.
- Minor leak: multiple threads cache the same data in `rangeNeighbor` at once, overwriting each other. Fix: use locks.
- Performance optimization: the scheduler gets initialized on every `DBSCAN` call. Fix: call it once.

For performance and memory testing, I used a point cloud of shape (12895, 2) with unique labels [-1, 0, 1, 2, 3, 4, 5, 6] in an n=2000 for loop. A final test with n=60000 showed no increase in memory under Heaptrack. Total performance gain: 1.72x.